Conversation
Pull request overview
Copilot reviewed 95 out of 100 changed files in this pull request and generated 8 comments.
Actionable comments posted: 4
♻️ Duplicate comments (2)
maskfile.md (1)
`102-166`: **Hardcoded service list and retry parameters remain.**

The retry logic has been improved with better status messages (lines 136-137, 157), but several issues from previous reviews remain:

- Service list hardcoded (line 118): adding/removing services requires code changes
- Retry parameters not externally configurable: variables exist (lines 122-124) but their values are hardcoded
- Status check incomplete: only verifies `ACTIVE` status; doesn't confirm `runningCount >= desiredCount`

Consider these improvements:

🔎 Proposed refactor

```diff
+# Allow override via environment
+SERVICES="${PSF_SERVICES:-pocketsizefund-datamanager pocketsizefund-portfoliomanager pocketsizefund-equitypricemodel}"
+MAX_RETRIES="${PSF_MAX_RETRIES:-12}"
+RETRY_WAIT_SECONDS="${PSF_RETRY_WAIT_SECONDS:-5}"
+
-for SERVICE in pocketsizefund-datamanager pocketsizefund-portfoliomanager pocketsizefund-equitypricemodel; do
+for SERVICE in $SERVICES; do
   echo "Checking if $SERVICE exists and is ready"
-  # Wait up to 60 seconds for service to be active
-  RETRY_COUNT=0
-  MAX_RETRIES=12
-  RETRY_WAIT_SECONDS=5
-  SERVICE_READY=false
+  RETRY_COUNT=0
+  SERVICE_READY=false
   while [ $RETRY_COUNT -lt $MAX_RETRIES ]; do
-    SERVICE_STATUS=$(aws ecs describe-services \
+    SERVICE_INFO=$(aws ecs describe-services \
       --cluster "$CLUSTER" \
       --services "$SERVICE" \
-      --query 'services[0].status' \
-      --output text 2>/dev/null || echo "NONE")
+      --query 'services[0].[status,runningCount,desiredCount]' \
+      --output text 2>/dev/null || echo "NONE 0 0")
+
+    read -r STATUS RUNNING DESIRED <<< "$SERVICE_INFO"

-    if [ "$SERVICE_STATUS" = "ACTIVE" ]; then
+    if [ "$STATUS" = "ACTIVE" ] && [ "$RUNNING" -ge "$DESIRED" ] && [ "$DESIRED" -gt 0 ]; then
       SERVICE_READY=true
-      echo "Service $SERVICE is ACTIVE"
+      echo "Service $SERVICE is ACTIVE with $RUNNING/$DESIRED tasks running"
       break
-    elif [ "$SERVICE_STATUS" = "NONE" ]; then
+    elif [ "$STATUS" = "NONE" ]; then
       echo "Service not found, waiting ($((RETRY_COUNT + 1))/$MAX_RETRIES)"
     else
-      echo "Service status: $SERVICE_STATUS, waiting ($((RETRY_COUNT + 1))/$MAX_RETRIES)"
+      echo "Service status: $STATUS ($RUNNING/$DESIRED tasks), waiting ($((RETRY_COUNT + 1))/$MAX_RETRIES)"
     fi
```

.github/workflows/launch_infrastructure.yaml (1)
`27-50`: **Add error handling to prevent deployment with failed builds.**

The separate build and push steps for each service run independently without error propagation. If an early build or push fails, the workflow continues and may deploy infrastructure with stale or missing images. Each `flox/activate-action` step should be configured to fail the workflow on error.

Verify that `flox/activate-action` properly propagates exit codes, or add explicit verification steps between image operations and deployment:

```yaml
- name: Verify all images pushed
  run: |
    aws_account_id=$(aws sts get-caller-identity --query Account --output text)
    for image in datamanager portfoliomanager equitypricemodel; do
      if ! docker manifest inspect ${aws_account_id}.dkr.ecr.us-east-1.amazonaws.com/pocketsizefund/${image}-server:latest > /dev/null 2>&1; then
        echo "Error: Image ${image}-server:latest not found in ECR"
        exit 1
      fi
    done
```
🧹 Nitpick comments (7)
tools/run_training_job.py (3)
`13-13`: **Remove unused `noqa` directive.**

The `noqa: PLR0913` directive is unused according to Ruff, indicating that the "too many arguments" rule is not being triggered for this function. Removing unused directives keeps the code cleaner.

🔎 Proposed fix

```diff
-def run_training_job(  # noqa: PLR0913
+def run_training_job(
     application_name: str,
     aws_profile: str,
     image_uri: str,
```
`34-41`: **Add structured logging before Estimator creation.**

Adding a log statement before creating the Estimator would improve observability and help with debugging if the Estimator construction fails (e.g., invalid IAM role, malformed image URI).

🔎 Proposed enhancement

```diff
+    logger.info(
+        "Creating estimator",
+        image_uri=image_uri,
+        instance_type="ml.t3.xlarge",
+        output_path=output_path,
+    )
+
     estimator = Estimator(
         image_uri=image_uri,
         role=sagemaker_role,
```
`49-57`: **Enhance logging for training execution.**

The training job execution would benefit from more detailed structured logging to improve observability. Consider adding a log before starting the job and including additional context (S3 path, image URI, instance type) in the error log.

🔎 Proposed enhancement

```diff
     try:
+        logger.info(
+            "Starting training job execution",
+            s3_data_path=s3_data_path,
+            image_uri=image_uri,
+            instance_type="ml.t3.xlarge",
+        )
         estimator.fit({"train": training_data_input})
+        logger.info("Training job completed successfully")
     except Exception as e:
         logger.exception(
             "Error during training job",
             error=f"{e}",
             application_name=application_name,
+            s3_data_path=s3_data_path,
+            image_uri=image_uri,
         )
         raise RuntimeError from e
```

applications/equitypricemodel/src/equitypricemodel/tide_data.py (3)
`29-30`: **Consider explicitly mentioning TiDE in the docstring.**

The docstring describes "Time-series dense encoder" but doesn't explicitly state "TiDE" or expand the acronym. For better discoverability and clarity, consider:

🔎 Proposed enhancement

```diff
-    """Time-series dense encoder data preprocessing and postprocessing."""
+    """TiDE (Time-series Dense Encoder) data preprocessing and postprocessing."""
```
`271-271`: **Remove unused `noqa` directives.**

Lines 271, 273, and 276 have `noqa` directives for rules that aren't enabled in the Ruff configuration (E501, C901). These should be removed to keep the code clean.

🔎 Proposed fix

```diff
-            "decoder_continuous_features": 0,  # not using decoder_continuous_features for now  # noqa: E501
+            "decoder_continuous_features": 0,  # not using decoder_continuous_features for now
-            "static_continuous_features": 0,  # not using static_continuous_features for now  # noqa: E501
+            "static_continuous_features": 0,  # not using static_continuous_features for now
-    def get_batches(  # noqa: C901
+    def get_batches(
```

Also applies to: 273-273, 276-276
`381-427`: **Use pathlib instead of os.path to align with coding guidelines and remove unused `noqa` directives.**

The save/load methods correctly create the target directory (the previous `os.path.dirname` issue is fixed ✅). However, the code uses `os.path` with multiple unused `noqa` directives (PTH103, PTH118, PTH123). Switching to `pathlib.Path` aligns with Python best practices and the project's coding guidelines, eliminating the need for these suppressions.

🔎 Proposed refactor using pathlib

Add import at the top of the file:

```diff
+from pathlib import Path
```

Then refactor the save method:

```diff
 def save(self, directory_path: str) -> None:
-    os.makedirs(directory_path, exist_ok=True)  # noqa: PTH103
+    Path(directory_path).mkdir(parents=True, exist_ok=True)

-    with open(os.path.join(directory_path, "tide_data_mappings.json"), "w") as f:  # noqa: PTH118, PTH123
+    with (Path(directory_path) / "tide_data_mappings.json").open("w") as f:
         json.dump(self.mappings, f)

-    with open(os.path.join(directory_path, "tide_data_scaler.json"), "w") as f:  # noqa: PTH118, PTH123
+    with (Path(directory_path) / "tide_data_scaler.json").open("w") as f:
```

And the load method:

```diff
 @classmethod
 def load(cls, directory_path: str) -> "Data":
     data = cls()
-    with open(os.path.join(directory_path, "tide_data_mappings.json")) as f:  # noqa: PTH118, PTH123
+    with (Path(directory_path) / "tide_data_mappings.json").open() as f:
         data.mappings = json.load(f)
-    with open(os.path.join(directory_path, "tide_data_scaler.json")) as f:  # noqa: PTH118, PTH123
+    with (Path(directory_path) / "tide_data_scaler.json").open() as f:
         scaler_data = json.load(f)
```

Based on coding guidelines and past review feedback.
infrastructure/__main__.py (1)
`116-128`: **Single NAT gateway creates availability zone dependency.**

Using one NAT gateway in `public_subnet_1` means all private subnet traffic egresses through availability zone A. If that AZ experiences issues, private tasks in AZ B lose internet connectivity.

For production, consider adding a second NAT gateway in `public_subnet_2` with a separate route table for `private_subnet_2`. For now, this is acceptable for cost optimization.
📜 Review details
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (6)
- `.github/workflows/launch_infrastructure.yaml`
- `.github/workflows/run_python_code_checks.yaml`
- `applications/equitypricemodel/src/equitypricemodel/tide_data.py`
- `infrastructure/__main__.py`
- `maskfile.md`
- `tools/run_training_job.py`
🧰 Additional context used
📓 Path-based instructions (3)
**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
**/*.py: Use Ruff as the project formatter (replaces black/isort) for all Python code
Lint Python code with Ruff using the comprehensive ruleset
Use Structlog for structured logging across services
Files:
- `tools/run_training_job.py`
- `infrastructure/__main__.py`
- `applications/equitypricemodel/src/equitypricemodel/tide_data.py`
**/*.{py,pyi}
📄 CodeRabbit inference engine (CLAUDE.md)
Type check with Pyright, using relaxed import resolution
Files:
- `tools/run_training_job.py`
- `infrastructure/__main__.py`
- `applications/equitypricemodel/src/equitypricemodel/tide_data.py`
applications/*/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Use FastAPI consistently for web services in all applications
Files:
applications/equitypricemodel/src/equitypricemodel/tide_data.py
🧠 Learnings (2)
📚 Learning: 2025-09-16T16:54:51.913Z
Learnt from: CR
Repo: pocketsizefund/pocketsizefund PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-09-16T16:54:51.913Z
Learning: Applies to **/*.py : Lint Python code with Ruff using the comprehensive ruleset
Applied to files:
maskfile.md
📚 Learning: 2025-09-16T16:54:51.913Z
Learnt from: CR
Repo: pocketsizefund/pocketsizefund PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-09-16T16:54:51.913Z
Learning: Applies to **/*.py : Use Ruff as the project formatter (replaces black/isort) for all Python code
Applied to files:
maskfile.md
🧬 Code graph analysis (1)
infrastructure/__main__.py (1)
applications/portfoliomanager/src/portfoliomanager/server.py (1)
`health_check` (58-59)
🪛 Ruff (0.14.10)
tools/run_training_job.py
13-13: Unused noqa directive (non-enabled: PLR0913)
Remove unused noqa directive
(RUF100)
applications/equitypricemodel/src/equitypricemodel/tide_data.py
271-271: Unused noqa directive (non-enabled: E501)
Remove unused noqa directive
(RUF100)
273-273: Unused noqa directive (non-enabled: E501)
Remove unused noqa directive
(RUF100)
276-276: Unused noqa directive (non-enabled: C901)
Remove unused noqa directive
(RUF100)
382-382: Unused noqa directive (non-enabled: PTH103)
Remove unused noqa directive
(RUF100)
384-384: Unused noqa directive (non-enabled: PTH118, PTH123)
Remove unused noqa directive
(RUF100)
387-387: Unused noqa directive (non-enabled: PTH118, PTH123)
Remove unused noqa directive
(RUF100)
411-411: Unused noqa directive (non-enabled: PTH118, PTH123)
Remove unused noqa directive
(RUF100)
414-414: Unused noqa directive (non-enabled: PTH118, PTH123)
Remove unused noqa directive
(RUF100)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Run Rust code checks
🔇 Additional comments (29)
applications/equitypricemodel/src/equitypricemodel/tide_data.py (5)
`1-8`: **LGTM!**

The additional imports (`json`, `os`, `numpy`) are appropriate for the new persistence (save/load) and batching functionality.
`139-140`: **LGTM!**

The preprocessing improvements are well-designed:
- Consistent uppercase normalization for categorical features (ticker, sector, industry)
- Filtering out UNKNOWN tickers prevents invalid data from entering the pipeline
- Filtering null/NaN daily_return is essential since it's the training target
- Final deduplication by ticker and timestamp ensures data quality
Also applies to: 174-189
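The rules in this list can be sketched in plain Python (illustrative only; the real pipeline operates on polars DataFrames, and these rows and field names are hypothetical):

```python
rows = [
    {"ticker": "aapl", "timestamp": 1.0, "daily_return": 0.01},
    {"ticker": "AAPL", "timestamp": 1.0, "daily_return": 0.01},  # duplicate after uppercasing
    {"ticker": "unknown", "timestamp": 2.0, "daily_return": 0.02},  # UNKNOWN ticker
    {"ticker": "msft", "timestamp": 3.0, "daily_return": None},  # null training target
]

seen: set[tuple[str, float]] = set()
cleaned = []
for row in rows:
    ticker = row["ticker"].upper()  # consistent uppercase normalization
    if ticker == "UNKNOWN" or row["daily_return"] is None:
        continue  # drop UNKNOWN tickers and null targets
    key = (ticker, row["timestamp"])
    if key in seen:
        continue  # deduplicate by ticker and timestamp
    seen.add(key)
    cleaned.append({**row, "ticker": ticker})

print([r["ticker"] for r in cleaned])  # → ['AAPL']
```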
`317-377`: **LGTM!**

The refactored two-stage batching approach (collect samples → batch with numpy stacking) is cleaner and more maintainable than incremental per-tick processing. The tensor shapes are correct: `static_categorical_features` is (batch_size, 1, 3) for the three static columns (ticker, sector, industry).
`444-448`: **LGTM! Previous indexing issue resolved.**

The tensor indexing is now correct: `static_categorical_features` has shape `(batch_size, 1, 3)` and the code correctly uses 3 indices `[batch_idx, 0, 0]` to extract the ticker value. The previous issue flagged in earlier reviews has been fixed. ✅
`497-497`: **Verify that zero prices are acceptable in your domain.**

The validation changed from `greater_than(0)` to `greater_than_or_equal_to(0)` for price columns. This is consistent with the preprocessing logic (lines 133-136) that fills null prices with `0.0`. Please confirm this aligns with your business logic: real market prices are rarely exactly zero, but this may be intentional for representing missing/holiday data.

Also applies to: 501-501, 505-505, 509-509
infrastructure/__main__.py (10)
`34-38`: **LGTM!**

The typo in the tag key has been corrected from `"manager:"` to `"manager"`.
155-230: LGTM!Security groups follow best practices: ALB accepts HTTP/HTTPS from the internet, ECS tasks only accept traffic from ALB on port 8080, and inter-service communication is properly allowed within the ECS security group.
`357-401`: **LGTM! Path shadowing issue resolved.**

The priority swap (datamanager at 100, portfoliomanager at 200) ensures `/portfolios*` routes to datamanager before the broader `/portfolio*` pattern matches.
470-496: LGTM! S3 policy construction is now correct.The bucket name is properly extracted from Secrets Manager and used to construct scoped IAM policies, addressing the previous concern about invalid ARNs.
`571-629`: **Verify if `task_role_arn` is intentionally omitted.**

Unlike `datamanager_task_definition` (line 527), this task definition lacks `task_role_arn`. Without it, the portfoliomanager container cannot access AWS resources at runtime (e.g., S3, DynamoDB).

If portfoliomanager only makes HTTP calls to other services and external APIs (Alpaca), this is fine. Otherwise, add:

```python
task_role_arn=task_role.arn,
```
`631-670`: **Verify if `task_role_arn` is intentionally omitted.**

Same as portfoliomanager: this task definition lacks `task_role_arn`. If equitypricemodel needs to access AWS resources (e.g., loading models from S3), it will fail at runtime.
`672-706`: **LGTM! Service discovery names are correct.**

The service discovery names now match the expected DNS hostnames used in task definitions (e.g., `datamanager.pocketsizefund.local`), resolving the previous inter-service communication issue.
760-777: LGTM!The equitypricemodel service is correctly configured as an internal-only service—accessible via service discovery but not exposed through the ALB. This aligns with the architecture where only portfoliomanager calls it internally.
`519-569`: **LGTM!**

The task definition family name typo has been corrected to `"datamanager"`, and both execution and task roles are properly assigned.
793-803: LGTM!Outputs properly export the key infrastructure identifiers for downstream use.
.github/workflows/launch_infrastructure.yaml (3)
`9-11`: **LGTM! Concurrency control properly configured.**

The concurrency group prevents overlapping infrastructure deployments, and `cancel-in-progress: false` ensures in-flight deployments complete. This addresses the concern from previous reviews.
`20-24`: **LGTM! AWS credentials configured correctly.**

The AWS credentials are properly configured using OIDC role assumption before the deployment steps.
`56-56`: **LGTM! Deployment command updated appropriately.**

The migration from `mise` to `mask` aligns with the broader tooling changes in this PR.

maskfile.md (10)
`56-71`: **LGTM! Docker build command properly structured.**

The build command correctly:

- Uses `set -euo pipefail` for error handling
- Specifies the `linux/amd64` platform for ECS compatibility
- Tags for both local and ECR registries
- Retrieves the AWS account ID dynamically
`77-92`: **LGTM! ECR authentication syntax is correct.**

The ECR login pipeline (lines 84-87) now uses proper line continuation with backslashes, which addresses the syntax issue from previous reviews.
`172-182`: **LGTM! Infrastructure teardown command is straightforward.**

The `down` command properly changes to the infrastructure directory and runs Pulumi destroy.
`246-339`: **LGTM! Rust development commands are well-structured.**

The Rust workflow commands properly use:

- `set -euo pipefail` for error handling
- Standard cargo commands with appropriate flags
- The `--workspace` flag for comprehensive testing (line 314)
- Sequential execution in the `all` command to run complete checks
`349-357`: **LGTM! Python dependencies managed with uv.**

The install command properly uses `uv sync --all-packages --all-groups` for comprehensive dependency installation.
`359-371`: **LGTM! Ruff used as formatter.**

Correctly uses Ruff for formatting, which aligns with the project's coding standards. Based on learnings, Ruff is the project formatter replacing black/isort.
`390-404`: **LGTM! Comprehensive Python linting with Ruff.**

Uses Ruff for linting with `--output-format=github` for CI integration. Based on learnings, this implements the comprehensive Ruff ruleset for the project.
`406-418`: **LGTM! Test coverage output path is consistent.**

Line 415 outputs coverage to `coverage/.python.xml`, which correctly matches the path configured in `.github/workflows/run_python_code_checks.yaml` (line 22).
`188-240`: **Python script path is correct.**

The script `tools/sync_equity_bars_data.py` exists and the command in line 220 will work as expected for the datamanager date-range invocation.
`465-484`: **Artifact download script pattern is sound.**

The implementation correctly references `tools/download_model_artifacts.py` and mirrors the training command structure.

.github/workflows/run_python_code_checks.yaml (1)
`22-22`: **The Coveralls action parameter change is correct.**

The parameter change from `path-to-lcov` to `file` is the proper v2 syntax; this is a valid upgrade from the v1 configuration. The path `coverage/.python.xml` correctly reflects the Python coverage output format.
Actionable comments posted: 6
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
libraries/python/tests/test_equity_bars_schema.py (1)
`170-189`: **Add type assertion for transactions field.**

The test validates type coercion for several fields but omits the newly added `transactions` field. For comprehensive coverage, add an assertion to verify that transactions are coerced to the expected type.

🔎 Suggested enhancement

```diff
 validated_df = equity_bars_schema.validate(data)

 assert validated_df["timestamp"].dtype == pl.Float64
 assert validated_df["open_price"].dtype == pl.Float64
 assert validated_df["high_price"].dtype == pl.Float64
 assert validated_df["volume"].dtype == pl.Int64
+assert validated_df["transactions"].dtype == pl.Int64
```

Additionally, consider testing type coercion for transactions by providing it as a string in the test data:

```diff
         "volume": ["1000000"],  # string that can be coerced to int
         "volume_weighted_average_price": [152.5],
-        "transactions": [100],
+        "transactions": ["100"],  # string that can be coerced to int
     }
```

pyproject.toml (1)
`32-47`: **Fix incorrect absolute path in pytest rootdir configuration.**

Line 42 specifies `--rootdir=/tests`, which is an absolute filesystem path that will likely fail. The `--rootdir` option should either be omitted (pytest will auto-detect) or use a relative path.

🔎 Recommended fix

```diff
 addopts = [
     "--verbose",
     "--tb=short",
     "--strict-markers",
     "--color=yes",
-    "--rootdir=/tests",
 ]
```

Alternatively, if you need to specify the root directory explicitly:

```diff
-    "--rootdir=/tests",
+    "--rootdir=.",
```

Note on deprecation warnings: lines 44-46 suppress all deprecation warnings. While this reduces noise, it may hide important compatibility issues. Consider being more selective if possible.
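The selective approach can be sketched with Python's `warnings` module: ignore one known deprecation by message while letting everything else surface (the message text here is hypothetical; in pytest's `filterwarnings` setting the same idea is an `ignore:<message>:DeprecationWarning` entry rather than a blanket `ignore::DeprecationWarning`).

```python
import warnings

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    # Ignore only one known deprecation by message; unrelated
    # deprecations are still recorded.
    warnings.filterwarnings(
        "ignore", message="legacy_api is deprecated", category=DeprecationWarning
    )
    warnings.warn("legacy_api is deprecated", DeprecationWarning, stacklevel=2)
    warnings.warn("other_api is deprecated", DeprecationWarning, stacklevel=2)

print(len(caught))  # → 1 (only the unmatched warning is recorded)
```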
♻️ Duplicate comments (10)
applications/portfoliomanager/src/portfoliomanager/server.py (2)
`258-262`: **Verify predictions endpoint uses parameterized queries.**

This function forwards `tickers_and_timestamps` derived from `prior_portfolio` to the datamanager `/predictions` endpoint. Retrieved learnings indicate a known SQL injection vulnerability in `applications/datamanager/src/predictions.rs` where user-supplied tickers flow to `storage::query_predictions_dataframe_from_s3` without validation, and SQL queries are built via string concatenation.

Confirm that the current datamanager `/predictions` handler uses parameterized queries or proper escaping for both tickers and timestamps to prevent exploitation.

Based on learnings, this vulnerability was acknowledged but deferred in PR #635.

```shell
#!/bin/bash
# Description: Check if datamanager predictions endpoint uses parameterized queries
# Expected: Find SQL query construction patterns in predictions handling

# Search for SQL query construction in datamanager predictions module
rg -nP --type=rust -C5 'query_predictions|SELECT.*FROM.*WHERE' applications/datamanager/
```
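The pattern the comment asks for can be illustrated in Python with `sqlite3` (illustrative only — the actual handler is Rust, and the table and column names here are hypothetical): tickers are bound as parameters rather than concatenated into the SQL text, so an injection attempt is treated as a literal, non-matching value.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE predictions (ticker TEXT, ts REAL)")
conn.execute("INSERT INTO predictions VALUES ('AAPL', 1674000000.0)")

tickers = ["AAPL", "'; DROP TABLE predictions; --"]
placeholders = ",".join("?" for _ in tickers)

# User-supplied tickers are passed as bound parameters, never
# interpolated into the query string itself.
rows = conn.execute(
    f"SELECT ticker FROM predictions WHERE ticker IN ({placeholders})",
    tickers,
).fetchall()
print(rows)  # → [('AAPL',)] — the injection string matches nothing
```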
`239-239`: **Fix `typing.cast` syntax error.**

The first argument to `cast` must be a type, not a string. Line 239 passes `"float"` (a string literal), which will confuse Pyright and other type checkers.

🔎 Proposed fix

```diff
-    start_timestamp = datetime.fromtimestamp(cast("float", timestamps.min()), tz=UTC)
+    start_timestamp = datetime.fromtimestamp(cast(float, timestamps.min()), tz=UTC)
```

As per coding guidelines, type check with Pyright.
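For context, `cast` never converts anything at runtime; it returns its second argument unchanged and only informs static analysis. A minimal sketch:

```python
from typing import cast

value: object = 3.14

# cast() is a runtime no-op: the same object comes back, but the
# type checker now treats it as a float.
as_float = cast(float, value)

print(as_float is value)  # → True
```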
maskfile.md (1)
`458-461`: **Environment loading pattern is sound; .env availability tracked separately.**

The isolated subshell with `set -a` / `set +a` for environment variable export is a good practice. As previously discussed, the `.env` file will be provisioned via Flox `hook.on-activate`.
`186-194`: **Ticker validation addresses SQL injection; error message exposure previously acknowledged.**

The ticker format validation effectively prevents SQL injection by allowing only alphanumeric characters, dots, and hyphens. The previous concern about the error message exposing the invalid ticker was acknowledged and deferred.
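The allowlist rule described here can be sketched in Python (illustrative; the actual implementation is the Rust code under review, and the function name is hypothetical):

```python
def is_valid_ticker(ticker: str) -> bool:
    # Allow only alphanumerics, dots, and hyphens, mirroring the
    # validation described above (sketch, not the Rust implementation).
    return bool(ticker) and all(c.isalnum() or c in ".-" for c in ticker)

print(is_valid_ticker("BRK.B"))       # → True
print(is_valid_ticker("'; DROP --"))  # → False
```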
applications/equitypricemodel/src/equitypricemodel/tide_model.py (4)
`1-15`: **LGTM—Structlog migration complete.**

The file now correctly uses Structlog for structured logging, resolving the previous concern about Loguru usage.
`197-253`: **LGTM—Training method improvements applied.**

The `train` method now correctly:

- Preserves and restores `Tensor.training` state using try/finally (lines 204-251)
- Handles empty batch lists defensively (lines 235-239)
- Uses Structlog for structured logging

These changes resolve previous review concerns.
`255-274`: **LGTM—Validation method handles edge cases.**

The `validate` method now defensively handles empty validation batch lists by returning `NaN` with a warning (lines 270-272), preventing `ZeroDivisionError`.
`276-316`: **LGTM—Persistence methods corrected.**

The `save` method now correctly creates the target directory (line 280) instead of its parent, resolving the previous directory creation bug.

applications/equitypricemodel/src/equitypricemodel/tide_data.py (2)
`29-30`: **LGTM—Docstring updated to reflect TiDE architecture.**

The class docstring now correctly references "Time-series dense encoder" instead of the previous "Temporal fusion transformer", aligning with the PR's model migration.
`381-427`: **LGTM—Persistence methods correctly implemented; remove unused `noqa` directives.**

The `save` method now correctly creates the target directory (line 382) instead of its parent, resolving the previous bug. The `load` method properly reconstructs the Data instance with scaler and mappings.

However, Ruff flags several `noqa` pragmas as unused (lines 382, 384, 387, 411, 414). Consider removing these for cleaner code.

🔎 Optional cleanup

```diff
-    os.makedirs(directory_path, exist_ok=True)  # noqa: PTH103
+    os.makedirs(directory_path, exist_ok=True)

-    with open(os.path.join(directory_path, "tide_data_mappings.json"), "w") as f:  # noqa: PTH118, PTH123
+    with open(os.path.join(directory_path, "tide_data_mappings.json"), "w") as f:

-    with open(os.path.join(directory_path, "tide_data_scaler.json"), "w") as f:  # noqa: PTH118, PTH123
+    with open(os.path.join(directory_path, "tide_data_scaler.json"), "w") as f:
@@
-    with open(os.path.join(directory_path, "tide_data_mappings.json")) as f:  # noqa: PTH118, PTH123
+    with open(os.path.join(directory_path, "tide_data_mappings.json")) as f:

-    with open(os.path.join(directory_path, "tide_data_scaler.json")) as f:  # noqa: PTH118, PTH123
+    with open(os.path.join(directory_path, "tide_data_scaler.json")) as f:
```
🧹 Nitpick comments (11)
.gitignore (1)
`9-9`: **Minor: Redundant coverage ignore patterns.**

Both `coverage/` (line 9) and `.coverage*` (line 13) are now present in the file, which creates a small redundancy. The directory pattern `coverage/` and file pattern `.coverage*` overlap in functionality. Consider whether both patterns are necessary or if one can be consolidated.

Also applies to: 13-13
tools/run_training_job.py (1)
`13-20`: **Remove unused `noqa` directive.**

Static analysis indicates `PLR0913` is not enabled in the Ruff ruleset. The directive is unnecessary.

🔎 Proposed fix

```diff
-def run_training_job(  # noqa: PLR0913
+def run_training_job(
     application_name: str,
```

tools/sync_equity_bars_data.py (1)
`102-102`: **Remove unused `noqa` directive.**

Static analysis indicates `PLR2004` is not enabled in the Ruff ruleset. The directive is unnecessary.

🔎 Proposed fix

```diff
-    if status_code >= 400:  # noqa: PLR2004
+    if status_code >= 400:
```

libraries/python/tests/test_equity_bars_schema.py (2)
`192-208`: **Consider testing missing transactions column specifically.**

While this test validates missing column behavior for `open_price`, it doesn't verify that the new `transactions` field is properly required. Since `transactions` is a newly added field (per the schema evolution), consider adding a dedicated test case to ensure missing transactions triggers the expected validation error.

🔎 Suggested additional test

Add a test case specifically for missing transactions:

```python
def test_equity_bars_schema_missing_transactions() -> None:
    data = pl.DataFrame(
        {
            "ticker": ["AAPL"],
            "timestamp": [1674000000.0],
            "open_price": [150.0],
            "high_price": [155.0],
            "low_price": [149.0],
            "close_price": [153.0],
            "volume": [1000000],
            "volume_weighted_average_price": [152.5],
            # Missing transactions
        }
    )

    with pytest.raises((SchemaError, pl.exceptions.ColumnNotFoundError)):
        equity_bars_schema.validate(data)
```
`7-247`: **Add validation tests for the transactions field.**

The new `transactions` field is consistently added across all test cases, but there are no dedicated tests validating its constraints (e.g., negative values, zero values, null values). Following the established test patterns for other fields like `volume` and prices, consider adding comprehensive validation tests for `transactions`.

🔎 Suggested additional tests

```python
def test_equity_bars_schema_negative_transactions() -> None:
    data = pl.DataFrame(
        {
            "ticker": ["AAPL"],
            "timestamp": [1674000000.0],
            "open_price": [150.0],
            "high_price": [155.0],
            "low_price": [149.0],
            "close_price": [153.0],
            "volume": [1000000],
            "volume_weighted_average_price": [152.5],
            "transactions": [-100],
        }
    )

    with pytest.raises(SchemaError):
        equity_bars_schema.validate(data)


def test_equity_bars_schema_zero_transactions() -> None:
    """Test that zero transactions is allowed (valid case with no trades)."""
    data = pl.DataFrame(
        {
            "ticker": ["AAPL"],
            "timestamp": [1674000000.0],
            "open_price": [150.0],
            "high_price": [155.0],
            "low_price": [149.0],
            "close_price": [153.0],
            "volume": [1000000],
            "volume_weighted_average_price": [152.5],
            "transactions": [0],
        }
    )

    # Adjust expectation based on schema requirements
    validated_df = equity_bars_schema.validate(data)
    assert validated_df["transactions"][0] == 0


def test_equity_bars_schema_null_transactions() -> None:
    data = pl.DataFrame(
        {
            "ticker": ["AAPL"],
            "timestamp": [1674000000.0],
            "open_price": [150.0],
            "high_price": [155.0],
            "low_price": [149.0],
            "close_price": [153.0],
            "volume": [1000000],
            "volume_weighted_average_price": [152.5],
            "transactions": [None],
        }
    )

    with pytest.raises(SchemaError):
        equity_bars_schema.validate(data)
```

Note: Adjust the zero transactions test expectation based on whether the schema should allow zero values (which might be valid for periods with no trading activity).
applications/portfoliomanager/src/portfoliomanager/server.py (1)
`63-63`: **Remove unused `noqa` directive.**

Ruff correctly flags that the `noqa` directive on line 63 is unnecessary since the codes `PLR0911` and `C901` are not enabled in the project's Ruff configuration.

🔎 Proposed fix

```diff
-def create_portfolio() -> Response:  # noqa: PLR0911, C901
+def create_portfolio() -> Response:
```

As per coding guidelines, Ruff is the project linter.

applications/datamanager/src/data.rs (1)
applications/datamanager/src/data.rs (1)
`104-109`: **Consider combining chained `with_columns` calls for efficiency.**

The sequential `with_columns` calls work correctly but could be combined into a single call for slightly better performance:

🔎 Proposed optimization

```diff
 let portfolio_dataframe = portfolio_dataframe
     .lazy()
-    .with_columns([col("ticker").str().to_uppercase().alias("ticker")])
-    .with_columns([col("side").str().to_uppercase().alias("side")])
-    .with_columns([col("action").str().to_uppercase().alias("action")])
+    .with_columns([
+        col("ticker").str().to_uppercase().alias("ticker"),
+        col("side").str().to_uppercase().alias("side"),
+        col("action").str().to_uppercase().alias("action"),
+    ])
     .collect()?;
```

infrastructure/__main__.py (1)
`357-376`: **Misleading comment on portfoliomanager rule.**

The comment on line 361 states "Ensures that the more specific data manager paths take precedence" but is attached to the `portfoliomanager_rule`. Since ALB evaluates rules in ascending priority order and datamanager has priority 100 (evaluated first), the comment describes datamanager's behavior, not portfoliomanager's.

Move this comment to the datamanager_rule block (lines 378-401) or reword it to clarify the relationship.

applications/equitypricemodel/src/equitypricemodel/server.py (1)
`31-125`: **Add structured logging to the prediction endpoint.**

The endpoint performs multiple data retrieval, transformation, and model inference steps without logging. Adding structured log statements at key stages (data fetch, preprocessing, inference, save) would improve observability and debugging.

As per coding guidelines, use Structlog for structured logging across services.

Example structured logging additions

```python
import structlog

logger = structlog.get_logger(__name__)


@application.post("/predictions")
def create_predictions() -> PredictionResponse:
    logger.info("Starting prediction generation")

    tide_model = Model.load(directory_path=".")
    logger.info("Model loaded successfully")

    # ... fetch data ...
    logger.info(
        "Fetched equity data",
        bars_count=len(equity_bars_data),
        details_count=len(equity_details_data),
    )

    # ... preprocessing ...
    logger.info("Data preprocessing complete", consolidated_rows=len(data))

    # ... inference ...
    logger.info("Generated predictions", batch_count=len(batches))

    # ... save ...
    logger.info("Predictions saved successfully")

    return PredictionResponse(data=processed_predictions.to_dict())
```
`95-95`: **Remove unused `noqa` directives flagged by Ruff.**

Ruff reports several `noqa` pragmas as unused because the corresponding rules (PLR0913, PTH103, PTH118, PTH123) are not enabled in the current configuration. Removing these pragmas will clean up the code without changing behavior.

🔎 Lines to clean

```diff
-    def __init__(  # noqa: PLR0913
+    def __init__(
         self,
         input_size: int,
         hidden_size: int = 128,
@@
     def save(
         self,
         directory_path: str,
     ) -> None:
-        os.makedirs(directory_path, exist_ok=True)  # noqa: PTH103
+        os.makedirs(directory_path, exist_ok=True)

         states = get_state_dict(self)
-        safe_save(states, os.path.join(directory_path, "tide_states.safetensor"))  # noqa: PTH118
+        safe_save(states, os.path.join(directory_path, "tide_states.safetensor"))

         parameters = {
@@
-        parameters_file_path = os.path.join(directory_path, "tide_parameters.json")  # noqa: PTH118
-        with open(parameters_file_path, "w") as parameters_file:  # noqa: PTH123
+        parameters_file_path = os.path.join(directory_path, "tide_parameters.json")
+        with open(parameters_file_path, "w") as parameters_file:
             json.dump(parameters, parameters_file)

     @classmethod
     def load(
         cls,
         directory_path: str,
     ) -> "Model":
-        states_file_path = os.path.join(directory_path, "tide_states.safetensor")  # noqa: PTH118
+        states_file_path = os.path.join(directory_path, "tide_states.safetensor")
         states = safe_load(states_file_path)

-        with open(  # noqa: PTH123
-            os.path.join(directory_path, "tide_parameters.json")  # noqa: PTH118
+        with open(
+            os.path.join(directory_path, "tide_parameters.json")
         ) as parameters_file:
             parameters = json.load(parameters_file)
```

Also applies to: 280-280, 284-284, 296-297, 305-305, 307-308
applications/equitypricemodel/src/equitypricemodel/tide_data.py (1)
271-276: Remove unused `noqa` directives flagged by Ruff. Ruff reports unused `noqa` pragmas at lines 271, 273, and 276 because the corresponding rules (E501, C901) are not enabled. Removing these will clean up the code.

🔎 Proposed cleanup
```diff
-            "decoder_continuous_features": 0,  # not using decoder_continuous_features for now  # noqa: E501
+            "decoder_continuous_features": 0,  # not using decoder_continuous_features for now
             "static_categorical_features": len(self.static_categorical_columns),
-            "static_continuous_features": 0,  # not using static_continuous_features for now  # noqa: E501
+            "static_continuous_features": 0,  # not using static_continuous_features for now
         }

-    def get_batches(  # noqa: C901
+    def get_batches(
         self,
```
📜 Review details
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
`Cargo.lock` is excluded by `!**/*.lock`
📒 Files selected for processing (21)
- .github/workflows/create_or_update_portfolio.yaml
- .github/workflows/launch_infrastructure.yaml
- .github/workflows/sync_data.yaml
- .gitignore
- applications/datamanager/src/data.rs
- applications/datamanager/src/equity_details.rs
- applications/datamanager/src/portfolios.rs
- applications/datamanager/src/storage.rs
- applications/equitypricemodel/src/equitypricemodel/equity_details_schema.py
- applications/equitypricemodel/src/equitypricemodel/server.py
- applications/equitypricemodel/src/equitypricemodel/tide_data.py
- applications/equitypricemodel/src/equitypricemodel/tide_model.py
- applications/equitypricemodel/tests/test_equity_details_schema.py
- applications/portfoliomanager/src/portfoliomanager/server.py
- infrastructure/__main__.py
- libraries/python/src/internal/equity_bars_schema.py
- libraries/python/tests/test_equity_bars_schema.py
- maskfile.md
- pyproject.toml
- tools/run_training_job.py
- tools/sync_equity_bars_data.py
✅ Files skipped from review due to trivial changes (1)
- applications/equitypricemodel/src/equitypricemodel/equity_details_schema.py
🚧 Files skipped from review as they are similar to previous changes (3)
- .github/workflows/launch_infrastructure.yaml
- .github/workflows/create_or_update_portfolio.yaml
- libraries/python/src/internal/equity_bars_schema.py
🧰 Additional context used
📓 Path-based instructions (6)
**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
**/*.py: Use Ruff as the project formatter (replaces black/isort) for all Python code
Lint Python code with Ruff using the comprehensive ruleset
Use Structlog for structured logging across services
Files:
- tools/run_training_job.py
- applications/portfoliomanager/src/portfoliomanager/server.py
- applications/equitypricemodel/src/equitypricemodel/server.py
- infrastructure/__main__.py
- applications/equitypricemodel/src/equitypricemodel/tide_model.py
- tools/sync_equity_bars_data.py
- applications/equitypricemodel/src/equitypricemodel/tide_data.py
- applications/equitypricemodel/tests/test_equity_details_schema.py
- libraries/python/tests/test_equity_bars_schema.py
**/*.{py,pyi}
📄 CodeRabbit inference engine (CLAUDE.md)
Type check with Pyright, using relaxed import resolution
Files:
- tools/run_training_job.py
- applications/portfoliomanager/src/portfoliomanager/server.py
- applications/equitypricemodel/src/equitypricemodel/server.py
- infrastructure/__main__.py
- applications/equitypricemodel/src/equitypricemodel/tide_model.py
- tools/sync_equity_bars_data.py
- applications/equitypricemodel/src/equitypricemodel/tide_data.py
- applications/equitypricemodel/tests/test_equity_details_schema.py
- libraries/python/tests/test_equity_bars_schema.py
applications/*/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Use FastAPI consistently for web services in all applications
Files:
- applications/portfoliomanager/src/portfoliomanager/server.py
- applications/equitypricemodel/src/equitypricemodel/server.py
- applications/equitypricemodel/src/equitypricemodel/tide_model.py
- applications/equitypricemodel/src/equitypricemodel/tide_data.py
- applications/equitypricemodel/tests/test_equity_details_schema.py
applications/*/tests/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Use Pytest with strict configuration for tests
Files:
applications/equitypricemodel/tests/test_equity_details_schema.py
applications/*/tests/test_*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Name test files as `test_*.py` under applications/*/tests
Files:
applications/equitypricemodel/tests/test_equity_details_schema.py
{pyproject.toml,applications/*/pyproject.toml,libraries/python/pyproject.toml}
📄 CodeRabbit inference engine (CLAUDE.md)
Enforce Python version 3.12.10 across all projects
Files:
pyproject.toml
🧠 Learnings (6)
📚 Learning: 2025-09-16T16:54:51.913Z
Learnt from: CR
Repo: pocketsizefund/pocketsizefund PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-09-16T16:54:51.913Z
Learning: Applies to **/*.py : Lint Python code with Ruff using the comprehensive ruleset
Applied to files:
maskfile.md, pyproject.toml
📚 Learning: 2025-09-16T16:54:51.913Z
Learnt from: CR
Repo: pocketsizefund/pocketsizefund PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-09-16T16:54:51.913Z
Learning: Applies to **/*.py : Use Ruff as the project formatter (replaces black/isort) for all Python code
Applied to files:
maskfile.md, pyproject.toml
📚 Learning: 2025-09-24T00:54:42.461Z
Learnt from: forstmeier
Repo: pocketsizefund/pocketsizefund PR: 635
File: applications/datamanager/src/predictions.rs:60-99
Timestamp: 2025-09-24T00:54:42.461Z
Learning: In applications/datamanager/src/predictions.rs, there is a known SQL injection vulnerability where user-supplied tickers from predictions_query flow to storage::query_predictions_dataframe_from_s3 without validation, and the storage function builds SQL queries via string concatenation. The user forstmeier acknowledged this issue but chose to skip fixing it in PR #635, deferring it for later.
Applied to files:
applications/portfoliomanager/src/portfoliomanager/server.py, applications/datamanager/src/storage.rs
📚 Learning: 2025-09-16T16:54:51.913Z
Learnt from: CR
Repo: pocketsizefund/pocketsizefund PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-09-16T16:54:51.913Z
Learning: Applies to **/*.py : Use Structlog for structured logging across services
Applied to files:
applications/equitypricemodel/src/equitypricemodel/tide_model.py
📚 Learning: 2025-09-16T20:20:16.305Z
Learnt from: forstmeier
Repo: pocketsizefund/pocketsizefund PR: 624
File: applications/datamanager/src/routes/equity.rs:6-6
Timestamp: 2025-09-16T20:20:16.305Z
Learning: User forstmeier prefers to defer switching from `Option<Json<DateRangeQuery>>` to `Option<Query<DateRangeQuery>>` for GET endpoints in the pocketsizefund datamanager Rust application, even when it's technically more correct for RESTful APIs.
Applied to files:
applications/datamanager/src/portfolios.rs
📚 Learning: 2025-09-16T16:54:51.913Z
Learnt from: CR
Repo: pocketsizefund/pocketsizefund PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-09-16T16:54:51.913Z
Learning: Applies to {pyproject.toml,applications/*/pyproject.toml,libraries/python/pyproject.toml} : Enforce Python version 3.12.10 across all projects
Applied to files:
pyproject.toml
🧬 Code graph analysis (9)
applications/datamanager/src/equity_details.rs (1)
applications/datamanager/src/storage.rs (1)
read_equity_details_dataframe_from_s3(438-475)
applications/equitypricemodel/src/equitypricemodel/server.py (2)
applications/equitypricemodel/src/equitypricemodel/tide_data.py (5)
`Data` (29-482), `load` (409-427), `preprocess_and_set_data` (35-219), `get_batches` (276-379), `postprocess_predictions` (429-482)
applications/equitypricemodel/src/equitypricemodel/tide_model.py (4)
`Model` (89-361), `load` (301-316), `validate` (255-274), `predict` (318-324)
infrastructure/__main__.py (1)
applications/portfoliomanager/src/portfoliomanager/server.py (1)
health_check(58-59)
tools/sync_equity_bars_data.py (1)
applications/datamanager/src/equity_bars.rs (1)
sync(87-228)
applications/equitypricemodel/src/equitypricemodel/tide_data.py (2)
libraries/python/src/internal/tft_dataset.py (4)
`TFTDataset` (26-392), `postprocess_predictions` (341-392), `_get_prediction_data` (240-249), `Scaler` (8-23)
libraries/python/src/internal/tft_model.py (1)
TFTModel(37-270)
applications/datamanager/src/storage.rs (2)
applications/datamanager/src/data.rs (4)
`create_equity_bar_dataframe` (19-39), `create_equity_details_dataframe` (114-153), `create_portfolio_dataframe` (94-112), `create_predictions_dataframe` (50-83)
applications/datamanager/src/equity_bars.rs (1)
query(53-106)
applications/datamanager/src/portfolios.rs (1)
applications/datamanager/src/storage.rs (1)
query_portfolio_dataframe_from_s3(345-436)
applications/equitypricemodel/tests/test_equity_details_schema.py (2)
applications/equitypricemodel/src/equitypricemodel/tide_model.py (1)
`validate` (255-274)
libraries/python/tests/test_company_information.py (2)
`test_company_information_schema_special_characters` (211-221), `test_company_information_schema_both_fields_uppercase_passes` (43-53)
libraries/python/tests/test_equity_bars_schema.py (1)
libraries/python/tests/test_equity_bar.py (9)
`test_equity_bar_schema_type_coercion` (162-180), `test_equity_bar_schema_multiple_rows` (219-235), `test_equity_bar_schema_ticker_uppercase_passes` (43-58), `test_equity_bar_schema_ticker_lowercase_fails` (25-40), `test_equity_bar_schema_negative_prices` (97-123), `test_equity_bar_schema_negative_timestamp` (61-76), `test_equity_bar_schema_negative_volume` (144-159), `test_equity_bar_schema_missing_required_column` (183-198), `test_equity_bar_schema_null_values` (201-216)
🪛 Ruff (0.14.10)
tools/run_training_job.py
13-13: Unused noqa directive (non-enabled: PLR0913)
Remove unused noqa directive
(RUF100)
applications/portfoliomanager/src/portfoliomanager/server.py
63-63: Unused noqa directive (non-enabled: PLR0911, C901)
Remove unused noqa directive
(RUF100)
applications/equitypricemodel/src/equitypricemodel/tide_model.py
95-95: Unused noqa directive (non-enabled: PLR0913)
Remove unused noqa directive
(RUF100)
280-280: Unused noqa directive (non-enabled: PTH103)
Remove unused noqa directive
(RUF100)
284-284: Unused noqa directive (non-enabled: PTH118)
Remove unused noqa directive
(RUF100)
296-296: Unused noqa directive (non-enabled: PTH118)
Remove unused noqa directive
(RUF100)
297-297: Unused noqa directive (non-enabled: PTH123)
Remove unused noqa directive
(RUF100)
305-305: Unused noqa directive (non-enabled: PTH118)
Remove unused noqa directive
(RUF100)
307-307: Unused noqa directive (non-enabled: PTH123)
Remove unused noqa directive
(RUF100)
308-308: Unused noqa directive (non-enabled: PTH118)
Remove unused noqa directive
(RUF100)
tools/sync_equity_bars_data.py
102-102: Unused noqa directive (non-enabled: PLR2004)
Remove unused noqa directive
(RUF100)
applications/equitypricemodel/src/equitypricemodel/tide_data.py
271-271: Unused noqa directive (non-enabled: E501)
Remove unused noqa directive
(RUF100)
273-273: Unused noqa directive (non-enabled: E501)
Remove unused noqa directive
(RUF100)
276-276: Unused noqa directive (non-enabled: C901)
Remove unused noqa directive
(RUF100)
382-382: Unused noqa directive (non-enabled: PTH103)
Remove unused noqa directive
(RUF100)
384-384: Unused noqa directive (non-enabled: PTH118, PTH123)
Remove unused noqa directive
(RUF100)
387-387: Unused noqa directive (non-enabled: PTH118, PTH123)
Remove unused noqa directive
(RUF100)
411-411: Unused noqa directive (non-enabled: PTH118, PTH123)
Remove unused noqa directive
(RUF100)
414-414: Unused noqa directive (non-enabled: PTH118, PTH123)
Remove unused noqa directive
(RUF100)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: Run Rust code checks
- GitHub Check: Run Python code checks
🔇 Additional comments (37)
.gitignore (1)
22-22: No action needed. The repository contains no JSON files, so the `**/*.json` pattern poses no risk of excluding essential configuration or metadata files.
tools/run_training_job.py (1)
1-103: Well-structured implementation addressing previous review feedback. The script correctly uses `structlog` for logging, includes proper type hints, validates environment variables with specific error messages for missing ones, and has comprehensive error handling around AWS session creation and training job execution. The previous issues (loguru usage, missing type hints, typo in `logger.exception`, vague error messages) have all been addressed.
tools/sync_equity_bars_data.py (3)
12-56: Date validation and clamping logic is correct. The function properly uses `max()` to clamp `start_date` to the minimum allowed date, `min()` to clamp `end_date` to the current date, and validates `start_date <= end_date` after clamping. Previous review concerns have been addressed.
59-71: Consider adding timeout and error-handling context. The 60-second timeout is reasonable for this operation. The function returns the status code, allowing the caller to handle errors appropriately. The implementation is clean and focused.
73-113: Good rate-limiting implementation with appropriate logging. The 15-second delay respects Polygon's rate limits, and the structured logging provides good observability. Error handling for `RequestException` ensures the loop continues processing remaining dates on transient failures.
libraries/python/tests/test_equity_bars_schema.py (3)
1-4: LGTM! Import statement correctly updated. The import statement properly reflects the schema rename from `equity_bar_schema` to `equity_bars_schema`, consistent with the broader API migration described in the PR.
7-23: Shape assertion correctly updated for the new transactions column. The test properly includes the new `transactions` field and updates the expected shape from `(1, 8)` to `(1, 9)`. However, consider adding an assertion to validate the `transactions` field value or type, similar to how other fields are validated.
230-247: LGTM! Multiple rows test correctly updated. The test properly includes `transactions` values for all three rows and correctly updates the expected shape from `(3, 8)` to `(3, 9)`.
applications/portfoliomanager/src/portfoliomanager/server.py (4)
1-55: LGTM: Clean initialization with proper credential validation. The initialization section correctly validates Alpaca credentials before proceeding, uses structlog for structured logging per coding guidelines, and properly configures the FastAPI application. The defensive credential check prevents runtime failures downstream.
62-198: Excellent migration to POST with comprehensive error handling. The endpoint now correctly uses POST for state-changing operations and implements robust per-trade result tracking with appropriate HTTP status codes (200 for full success, 207 for partial failures). The structured error handling and logging provide good observability.
289-315: LGTM: Well-structured portfolio optimization flow. The function properly delegates computation to the risk management module, validates the result against the schema before persistence, and follows REST semantics for saving the portfolio. The error propagation via `raise_for_status()` is appropriate.
318-362: LGTM: Trade side mappings are now correct. The function properly maps trade sides for both opening and closing positions:
- Closing LONG → SELL (line 336)
- Closing SHORT → BUY (line 338)
- Opening LONG → BUY (line 351)
- Opening SHORT → SELL (line 353)
The enum types are correctly used (not `.value` strings), and the position determination logic properly filters by action.
pyproject.toml (4)
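The four-way mapping above is small enough to capture in a sketch. The enum names here are illustrative stand-ins, not the portfoliomanager's actual types:

```python
from enum import Enum


class PositionSide(Enum):
    LONG = "long"
    SHORT = "short"


class OrderSide(Enum):
    BUY = "buy"
    SELL = "sell"


def trade_side(position_side: PositionSide, *, closing: bool) -> OrderSide:
    # Closing a position takes the opposite order side from opening it.
    if closing:
        return OrderSide.SELL if position_side is PositionSide.LONG else OrderSide.BUY
    return OrderSide.BUY if position_side is PositionSide.LONG else OrderSide.SELL
```

Expressing the rule as a single function makes the symmetry (closing inverts opening) easy to test exhaustively.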
21-27: Good practice: Explicit workspace members. Changing from wildcard-based workspace members to an explicit list improves clarity and prevents accidental inclusion of directories. The addition of "infrastructure" and "tools" aligns with the PR's infrastructure rebuild.
58-58: LGTM: Coverage output path simplified. The coverage output path has been cleaned up and standardized, making it more concise.
94-94: LGTM: Enhanced linting with PLR. Adding "PLR" (pylint refactor) rules enhances the already comprehensive Ruff ruleset, helping catch additional refactoring opportunities.
6-15: Verify that production dependencies at the workspace root are intentional. Root-level `dependencies` in a workspace should typically contain only shared development tools, not production libraries like FastAPI, Uvicorn, SageMaker, NumPy, and TinyGrad. These dependencies should be declared in the `pyproject.toml` of the individual workspace members that actually use them. If this root configuration is intentional for shared infrastructure or tooling, confirm the use case and document it clearly.
applications/equitypricemodel/tests/test_equity_details_schema.py (2)
3-3: LGTM! Import path correctly reflects schema migration. The import has been properly updated to reference the schema from its new public location in the equitypricemodel application.
7-221: Review comment is based on incorrect assumptions. The test duplication concern cannot be verified because the referenced library test file (`libraries/python/tests/test_company_information.py`) does not exist in the repository. The test file imports from a local schema module (`equitypricemodel.equity_details_schema`), indicating the schema was migrated as an application-level component, not copied from an external library.
The tests themselves are well-structured with clear naming conventions and comprehensive coverage of validation scenarios. While pytest's `@pytest.mark.parametrize` decorator allows executing the same test logic with various input values, eliminating redundant code and streamlining repetitive test suites, the current test organization is acceptable and parameterization is not required for this suite.
Likely an incorrect or invalid review comment.
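As a sketch of what parameterization would look like here, with a hypothetical predicate standing in for the schema's uppercase-ticker check:

```python
import pytest


def is_valid_ticker(ticker: str) -> bool:
    # Hypothetical stand-in for the schema's uppercase-ticker validation.
    return ticker.isalpha() and ticker.isupper()


@pytest.mark.parametrize(
    ("ticker", "expected"),
    [
        ("AAPL", True),
        ("aapl", False),
        ("AAP1", False),
        ("", False),
    ],
)
def test_ticker_validation(ticker: str, expected: bool) -> None:
    assert is_valid_ticker(ticker) == expected
```

Each tuple becomes its own collected test case, so a single function replaces four near-identical test bodies.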
.github/workflows/sync_data.yaml (1)
1-21: Workflow structure looks good; mask command now exists. The workflow correctly sets up the Flox environment and invokes the datamanager service. The `mask infrastructure services invoke datamanager` command is now defined in maskfile.md (lines 221-236), resolving the previous concern about a missing command.
One minor note: the cron comment states "5:00 AM EST / 6:00 AM EDT" but these would be different times depending on daylight saving. Consider clarifying which timezone is intended or use a single reference like "5:00 AM ET".
applications/datamanager/src/portfolios.rs (1)
49-62: Clean implementation of optional timestamp query parameter. The `QueryParameters` struct and updated `get` handler correctly implement optional timestamp-based filtering. The pattern aligns with the codebase's approach for query parameters in GET endpoints.
applications/datamanager/src/data.rs (1)
114-153: Well-structured CSV parsing with proper validation and transformation. The function correctly validates required columns before processing, handles errors with descriptive messages, and applies consistent transformations (uppercase + null filling). The implementation follows the existing patterns in the codebase.
maskfile.md (3)
36-40: GitHub CLI auth check now fails gracefully - previous issue addressed. The code now properly checks for GitHub CLI authentication and exits with a clear error message instead of attempting an interactive login that would hang in CI.
88-91: ECR login command syntax is now correct. The AWS ECR login command properly uses backslash continuations and piping, addressing the previous syntax issue.
122-167: Retry logic improved with parameterized wait and better logging. The retry implementation now uses `RETRY_WAIT_SECONDS` for configurability and includes clearer status logging. The hardcoded service list is acceptable for maintainability at this scale.
applications/datamanager/src/storage.rs (1)
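The shell loop amounts to a bounded poll. This Python sketch shows the shape of that logic, with `check_status` as a stand-in for the `aws ecs describe-services` call:

```python
import time
from typing import Callable


def wait_until_service_ready(
    check_status: Callable[[], tuple[str, int, int]],
    max_retries: int = 12,
    wait_seconds: float = 5.0,
) -> bool:
    # check_status returns (status, running_count, desired_count),
    # standing in for the `aws ecs describe-services` query.
    for _ in range(max_retries):
        status, running, desired = check_status()
        if status == "ACTIVE" and desired > 0 and running >= desired:
            return True
        time.sleep(wait_seconds)
    return False
```

Checking `running >= desired` in addition to the `ACTIVE` status covers the gap noted earlier, where a service can be ACTIVE with zero running tasks.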
438-475: Well-implemented S3 CSV reader with proper error handling. The function follows established patterns in the codebase with comprehensive error handling at each step (S3 fetch, body read, UTF-8 conversion, DataFrame creation) and appropriate logging.
applications/datamanager/src/equity_details.rs (1)
11-56: Handler structure is clean and follows established patterns. The endpoint correctly fetches data from S3, transforms to CSV, sets appropriate headers, and handles errors with informative messages. The implementation aligns with other handlers in the datamanager service.
infrastructure/__main__.py (8)
564-622: LGTM — Correct use of least-privilege for task role. The portfoliomanager task definition correctly omits `task_role_arn` because the service only communicates with other internal services over HTTP and does not require AWS SDK permissions. This follows the principle of least privilege.
753-770: Verify that equitypricemodel is intentionally private. The `equitypricemodel_service` is configured without ALB integration (no `load_balancers` field), making it accessible only via service discovery. This means it can only be called by other services within the VPC, not from the public internet. Confirm this matches your intended architecture—if equitypricemodel should be publicly accessible, add it to the ALB configuration with appropriate listener rules and a target group.
469-495: LGTM — S3 policy correctly retrieves bucket name from secret. The S3 policy construction has been properly fixed. It now fetches the secret value, extracts the bucket name from the JSON payload, and uses it to build valid S3 ARNs. The permissions (GetObject, PutObject, ListBucket) are appropriately scoped to the specific bucket.
428-447: LGTM — Secrets Manager permissions correctly added. The execution role now includes the required `secretsmanager:GetSecretValue` permission, properly scoped to the specific secret ARN. This resolves the issue where tasks would fail to start due to missing permissions.
665-699: LGTM — Service discovery DNS names now match expected URLs. The service discovery service names have been corrected to `"datamanager"`, `"portfoliomanager"`, and `"equitypricemodel"` (without prefix). This creates DNS records that match the URLs used in container environment variables, enabling correct inter-service communication.
518-562: LGTM — Task definition properly configured with secrets and roles. The datamanager task definition correctly:
- Uses the `secrets` field for sensitive credentials (avoiding plaintext in the environment)
- Specifies `task_role_arn` to grant S3 access at runtime
- Configures CloudWatch logging appropriately
34-38: LGTM — Tag typo corrected. The tag key has been fixed from `"manager:"` to `"manager"`, removing the erroneous trailing colon.
701-751: LGTM — Service dependencies correctly configured. The ECS services have appropriate dependency chains:
- `portfoliomanager_service` depends on `datamanager_service` (which it calls)
- `equitypricemodel_service` depends on `datamanager_service` (which it calls)

This ensures services start in the correct order and prevents startup failures due to missing dependencies.
applications/equitypricemodel/src/equitypricemodel/tide_data.py (2)
444-448: LGTM—Tensor indexing is correct for 3D shape. The indexing at line 446 correctly uses three indices `[batch_idx, 0, 0]` to extract the ticker from `static_categorical_features`, which has shape `(batch_size, 1, 3)` as constructed in `get_batches` (lines 367-369). This properly accesses the first static categorical feature (ticker) for each batch sample.
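A minimal NumPy illustration of that indexing, assuming the ticker id sits at feature position 0 of the single static row:

```python
import numpy as np

batch_size = 4
# Shape (batch_size, 1, 3): one static row per sample with three
# categorical features; feature 0 is assumed to be the ticker id.
static_categorical_features = np.arange(batch_size * 3).reshape(batch_size, 1, 3)

# [batch_idx, 0, 0] selects the first feature of each sample's static row.
tickers = [
    static_categorical_features[batch_idx, 0, 0]
    for batch_idx in range(batch_size)
]
```

The middle index is always 0 because the static dimension has length 1; only the batch index varies.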
497-512: Zero prices are intentional for missing days—design is documented. The schema validation allows zero prices (using `greater_than_or_equal_to(0)`) with an inline comment explaining this is for missing days. This aligns with the preprocessing logic (lines 133-138) that fills null prices with `0.0`. The `is_holiday` flag helps distinguish actual missing market days from trading days, making this a reasonable approach for gap handling.
applications/equitypricemodel/src/equitypricemodel/server.py (1)
112-112: Critical bug claim: validating wrong variable—should validate `processed_predictions`. Line 112 was reported to validate `predictions` (the full 7-day forecast) instead of `processed_predictions` (filtered to only the 7th timestep), which would mean the schema check did not cover the data actually being saved and returned.
On closer inspection, however, line 112 already reads `processed_predictions = predictions_schema.validate(processed_predictions)`, so the code is correct and the earlier review comment appears to be outdated.
Likely an incorrect or invalid review comment.
Pull request overview
Copilot reviewed 96 out of 101 changed files in this pull request and generated 2 comments.
Comments suppressed due to low confidence (1)
maskfile.md:1
- The cron schedule '0 23 * * *' runs at 11:00 PM UTC, which is 6:00 PM EST during standard time but 7:00 PM EDT during daylight saving time. Consider clarifying the timezone handling or adjusting for DST if the 6:00 PM Eastern time is critical.
# Pocket Size Fund Task Manager
Actionable comments posted: 2
♻️ Duplicate comments (4)
tools/sync_equity_bars_data.py (2)
117-117: Remove unused `noqa` directive. The `# noqa: PLR2004` directive on line 117 is flagged by Ruff (RUF100) because the `PLR2004` rule is not enabled. Remove the directive—the literal `3` is clear and acceptable for argument count validation.
🔎 Proposed fix
```diff
-    if len(sys.argv) != 3:  # noqa: PLR2004
+    if len(sys.argv) != 3:
```

Based on static analysis hints.
102-102: Remove unused `noqa` directive. The `# noqa: PLR2004` directive on line 102 is flagged by Ruff (RUF100) because the `PLR2004` rule is not enabled in your configuration. Remove the directive—the magic number `400` is clear and acceptable in this context.
🔎 Proposed fix
```diff
-    if status_code >= 400:  # noqa: PLR2004
+    if status_code >= 400:
```

Based on static analysis hints.
applications/datamanager/src/storage.rs (2)
184-203: Ticker validation improves security but error message leaks input. The ticker validation (lines 186-194) correctly addresses the SQL injection concerns from previous reviews. However, the error message at line 192 exposes the invalid ticker value, which could aid security probing.
🔎 Proposed fix (from past review)
```diff
-        return Err(Error::Other(format!("Invalid ticker format: {}", ticker)));
+        return Err(Error::Other("Invalid ticker format".to_string()));
```

This matches the past comment from Copilot about avoiding leaking the invalid input value.
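For comparison, an allowlist-style ticker check fits in a few lines. The pattern below is an assumption about the accepted format, shown in Python rather than the datamanager's Rust:

```python
import re

# Assumed ticker format: up to 10 chars of uppercase letters, digits, dot,
# or hyphen, starting with a letter. Illustrative, not the exact rule.
TICKER_PATTERN = re.compile(r"^[A-Z][A-Z0-9.\-]{0,9}$")


def ticker_is_allowed(ticker: str) -> bool:
    # Rejecting anything outside the allowlist keeps untrusted values
    # from ever reaching SQL string concatenation.
    return TICKER_PATTERN.fullmatch(ticker) is not None
```

Validating against an allowlist at the boundary, combined with a generic error message, avoids both the injection path and the information leak.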
323-323: Inconsistent debug log format. Line 323 uses `"Executing export SQL:"` while lines 224 and 418 use `"Executing query SQL:"`. Maintaining consistent log message formats improves grep-ability and observability.
🔎 Proposed fix
```diff
-    debug!("Executing export SQL: {}", query);
+    debug!("Executing query SQL: {}", query);
```

This matches the format used elsewhere in the file and aligns with the past Copilot review comment.
🧹 Nitpick comments (7)
tools/sync_equity_bars_data.py (2)
12-56: Consider adding descriptive messages to RuntimeError exceptions. The function raises `RuntimeError` without custom messages on lines 17, 21, 36, and 54. While the logger captures context, adding explicit exception messages would improve debugging and make the call stack more informative for callers who catch these exceptions.
🔎 Suggested improvements
```diff
     try:
         date_range = json.loads(date_range_json)
     except json.JSONDecodeError as e:
         logger.exception("JSON decoding error", error=f"{e}")
-        raise RuntimeError from e
+        raise RuntimeError("Failed to decode date range JSON") from e

     if "start_date" not in date_range or "end_date" not in date_range:
         logger.error("Missing required date fields", date_range=date_range)
-        raise RuntimeError
+        raise RuntimeError("Missing required date fields: start_date and end_date")

     try:
         start_date = datetime.strptime(date_range["start_date"], "%Y-%m-%d").replace(
             tzinfo=UTC
         )
         end_date = datetime.strptime(date_range["end_date"], "%Y-%m-%d").replace(
             tzinfo=UTC
         )
     except ValueError as e:
         logger.exception(
             "Date parsing error",
             error=f"{e}",
             required_format="YYYY-MM-DD",
         )
-        raise RuntimeError from e
+        raise RuntimeError("Failed to parse dates in YYYY-MM-DD format") from e

     # ... date clamping logic ...

     if start_date > end_date:
         logger.error(
             "Invalid date range after clamping",
             start_date=start_date.strftime("%Y-%m-%d"),
             end_date=end_date.strftime("%Y-%m-%d"),
         )
-        raise RuntimeError
+        raise RuntimeError(
+            "Invalid date range: start_date is after end_date following two-year window clamping"
+        )
```
127-138: Consider simplifying empty string validation. The current implementation creates a dictionary solely for logging, then loops through both arguments. While the validation itself is valuable (preventing empty strings like `python script.py "" "{}"`), the logic could be more concise.
🔎 Alternative implementation
```diff
     base_url = sys.argv[1]
     raw_date_range = sys.argv[2]

-    arguments = {
-        "base_url": base_url,
-        "raw_date_range": raw_date_range,
-    }
-
-    for argument in [base_url, raw_date_range]:
-        if not argument:
-            logger.error(
-                "Missing required positional argument(s)",
-                **arguments,
-            )
-            sys.exit(1)
+    if not base_url or not raw_date_range:
+        logger.error(
+            "Empty argument(s) provided",
+            base_url=base_url or "(empty)",
+            raw_date_range=raw_date_range or "(empty)",
+        )
+        sys.exit(1)
```

Alternatively, keep the current approach if you prefer the explicit dictionary—it's clear and works correctly.
applications/equitypricemodel/src/equitypricemodel/server.py (5)
31-38: Add error handling and logging for model loading. The endpoint lacks error handling and observability. If `Model.load()` fails (missing files, corrupt state), the endpoint will return a generic 500 error. The hard-coded directory path `"."` also reduces testability.
🔎 Proposed improvements
```diff
+logger = structlog.get_logger()
+
+MODEL_DIRECTORY = os.getenv("PSF_MODEL_DIRECTORY", ".")
+
 @application.post("/predictions")
 def create_predictions() -> PredictionResponse:
+    logger.info("predictions.create.started")
+
+    try:
-    tide_model = Model.load(directory_path=".")
+        tide_model = Model.load(directory_path=MODEL_DIRECTORY)
+    except Exception as e:
+        logger.error("predictions.model_load.failed", error=str(e))
+        raise
```

Based on coding guidelines requiring Structlog for structured logging.
40-66: Past issues resolved; consider adding request logging. The previous issues have been correctly addressed:
- `raise_for_status()` is now called after each request (lines 49, 56)
- `pl.DataFrame()` is correctly used instead of `pl.read_json()` (line 64)
- `Data.load()` classmethod usage is correct (line 89, reviewed separately)

However, consider adding structured logging for the HTTP requests to improve observability of external dependencies.
🔎 Optional logging additions
```diff
+    logger.info("predictions.fetch_equity_bars.started",
+                start_date=start_date.isoformat(),
+                end_date=end_date.isoformat())
     equity_bars_response = requests.get(
         url=f"{DATAMANAGER_BASE_URL}/equity-bars",
         params={
             "start_date": start_date.isoformat(),
             "end_date": end_date.isoformat(),
         },
         timeout=60,
     )
     equity_bars_response.raise_for_status()
+    logger.info("predictions.fetch_equity_bars.completed",
+                bytes=len(equity_bars_response.content))
```
85-91: Validate non-empty data before processing. After joining and selecting columns (line 85), the `data` DataFrame could be empty if no tickers match between equity_details and equity_bars. Processing empty data through the TiDE pipeline will likely fail in unexpected ways.
🔎 Proposed validation
```diff
     data = consolidated_data.select(retained_columns)
+
+    if data.height == 0:
+        logger.error("predictions.data_consolidation.empty_result")
+        raise ValueError("No matching equity data found for prediction")

     current_timestamp = datetime.now(tz=UTC)
```
105-116: Validate filtered predictions are non-empty. After filtering to the 7th timestep (line 107), `processed_predictions` could be empty if the model output doesn't include that specific day. Attempting to save and return empty predictions may not be the intended behavior.
🔎 Proposed validation
```diff
     processed_predictions = predictions.filter(
         pl.col("timestamp")
         == int(
             processed_prediction_timestamp.replace(
                 hour=0, minute=0, second=0, microsecond=0
             ).timestamp()
         )
     )
+
+    if processed_predictions.height == 0:
+        logger.warning("predictions.filter.no_results",
+                       target_timestamp=processed_prediction_timestamp.isoformat())
+        # Consider: should this be an error or return empty gracefully?

     processed_predictions = predictions_schema.validate(processed_predictions)
```
118-129: Add logging for prediction save operation.

The save operation to the datamanager lacks logging, making it difficult to observe whether predictions were successfully persisted.
🔎 Proposed logging
```diff
+    logger.info("predictions.save.started",
+                count=processed_predictions.height,
+                timestamp=current_timestamp.isoformat())
     save_predictions_response = requests.post(
         url=f"{DATAMANAGER_BASE_URL}/predictions",
         json={
             "timestamp": current_timestamp.isoformat(),
             "data": processed_predictions.to_dicts(),
         },
         timeout=60,
     )
     save_predictions_response.raise_for_status()
+    logger.info("predictions.save.completed")

     return PredictionResponse(data=processed_predictions.to_dict())
```
📜 Review details
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
- applications/datamanager/src/storage.rs
- applications/equitypricemodel/src/equitypricemodel/server.py
- tools/sync_equity_bars_data.py
🧰 Additional context used
📓 Path-based instructions (3)
**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
**/*.py: Use Ruff as the project formatter (replaces black/isort) for all Python code
Lint Python code with Ruff using the comprehensive ruleset
Use Structlog for structured logging across services
Files:
- applications/equitypricemodel/src/equitypricemodel/server.py
- tools/sync_equity_bars_data.py
**/*.{py,pyi}
📄 CodeRabbit inference engine (CLAUDE.md)
Type check with Pyright, using relaxed import resolution
Files:
- applications/equitypricemodel/src/equitypricemodel/server.py
- tools/sync_equity_bars_data.py
applications/*/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Use FastAPI consistently for web services in all applications
Files:
applications/equitypricemodel/src/equitypricemodel/server.py
🧠 Learnings (1)
📚 Learning: 2025-09-24T00:54:42.461Z
Learnt from: forstmeier
Repo: pocketsizefund/pocketsizefund PR: 635
File: applications/datamanager/src/predictions.rs:60-99
Timestamp: 2025-09-24T00:54:42.461Z
Learning: In applications/datamanager/src/predictions.rs, there is a known SQL injection vulnerability where user-supplied tickers from predictions_query flow to storage::query_predictions_dataframe_from_s3 without validation, and the storage function builds SQL queries via string concatenation. The user forstmeier acknowledged this issue but chose to skip fixing it in PR #635, deferring it for later.
Applied to files:
applications/datamanager/src/storage.rs
🧬 Code graph analysis (2)
applications/datamanager/src/storage.rs (3)
applications/datamanager/src/data.rs (5)
- create_equity_bar_dataframe (19-39)
- create_equity_details_dataframe (114-153)
- create_portfolio_dataframe (94-112)
- create_predictions_dataframe (50-83)
- Portfolio (85-90)

applications/datamanager/src/predictions.rs (1)

- query (57-96)

applications/datamanager/src/equity_bars.rs (1)

- query (53-106)
applications/equitypricemodel/src/equitypricemodel/server.py (3)
applications/equitypricemodel/src/equitypricemodel/preprocess.py (1)
- filter_equity_bars (4-22)

applications/equitypricemodel/src/equitypricemodel/tide_data.py (4)

- Data (29-482)
- load (409-427)
- preprocess_and_set_data (35-219)
- postprocess_predictions (429-482)

applications/equitypricemodel/src/equitypricemodel/tide_model.py (4)

- Model (89-361)
- load (301-316)
- validate (255-274)
- predict (318-324)
🪛 Ruff (0.14.10)
tools/sync_equity_bars_data.py
102-102: Unused noqa directive (non-enabled: PLR2004)
Remove unused noqa directive
(RUF100)
117-117: Unused noqa directive (non-enabled: PLR2004)
Remove unused noqa directive
(RUF100)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Run Rust code checks
🔇 Additional comments (3)
applications/datamanager/src/storage.rs (3)
16-95: LGTM: Consistent timestamp-based API migration.

The migration from date-based to timestamp-based parameters across all write functions is implemented consistently. Path derivation correctly extracts year/month/day from the timestamp for partitioned storage.
345-438: Past critical issue resolved: action column now included.

The previous critical issue about the missing `action` column has been correctly fixed:

- Line 372: `action` included in the `Some(timestamp)` branch
- Line 408: `action` included in the `None` (most recent) branch
- Line 429: Portfolio mapping reads `action` from column index 4

Both query branches now align with the Portfolio struct's expected schema.
440-477: LGTM: Well-structured new data reader with proper error handling.

The new `read_equity_details_dataframe_from_s3` function is well-implemented:

- Clear error handling with contextual messages
- Appropriate info-level logging for observability
- Correct integration with `create_equity_details_dataframe`
- Follows established patterns in the codebase
Actionable comments posted: 1
🧹 Nitpick comments (2)
applications/equitypricemodel/src/equitypricemodel/server.py (1)
49-57: Narrow exception handling to specific exception types.

Catching generic `Exception` is too broad and may mask unexpected errors. Use `requests.exceptions.RequestException` (or more specific subclasses like `requests.exceptions.HTTPError` and `requests.exceptions.Timeout`) to handle only request-related failures.

Additionally, note that line 140 raises the exception while other error paths return HTTP 500 responses; consider returning a 500 status consistently instead of propagating the exception.
Details
🔎 Suggested improvements

```diff
+import requests.exceptions

-    except Exception as e:
+    except requests.exceptions.RequestException as e:
         logger.exception(
             "Failed to fetch equity bars data",
             start_date=start_date.isoformat(),
             end_date=end_date.isoformat(),
             error=f"{e}",
         )
         return Response(status_code=status.HTTP_500_INTERNAL_SERVER_ERROR)
```

Apply similar changes to the other two try-except blocks (lines 67-73 and 134-140). For consistency at line 140, return an error response instead of raising:
```diff
-    except Exception as e:
+    except requests.exceptions.RequestException as e:
         logger.exception(
             "Failed to save predictions data",
             timestamp=current_timestamp.isoformat(),
             error=f"{e}",
         )
-        raise
+        return Response(status_code=status.HTTP_500_INTERNAL_SERVER_ERROR)
```

Also applies to: 67-73, 134-140
infrastructure/__main__.py (1)
564-663: Consider adding task roles to portfoliomanager and equitypricemodel for consistency.

While these services don't currently require AWS resource access (portfoliomanager only needs Alpaca API access via secrets, equitypricemodel only calls datamanager), adding task roles now (even with minimal/no permissions) would provide:
- Consistency with datamanager's configuration
- Easier future extensibility if AWS resource access is needed
- Clearer separation between execution permissions and runtime permissions
🔎 Example: Add task role to portfoliomanager
```diff
 portfoliomanager_task_definition = aws.ecs.TaskDefinition(
     "portfoliomanager_task",
     family="portfoliomanager",
     cpu="256",
     memory="512",
     network_mode="awsvpc",
     requires_compatibilities=["FARGATE"],
     execution_role_arn=execution_role.arn,
+    task_role_arn=task_role.arn,
     container_definitions=pulumi.Output.all(
```

Apply the same change to equitypricemodel_task_definition at line 631.
📜 Review details
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
- applications/equitypricemodel/src/equitypricemodel/server.py
- infrastructure/__main__.py
🧰 Additional context used
📓 Path-based instructions (3)
**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
**/*.py: Use Ruff as the project formatter (replaces black/isort) for all Python code
Lint Python code with Ruff using the comprehensive ruleset
Use Structlog for structured logging across services
Files:
- infrastructure/__main__.py
- applications/equitypricemodel/src/equitypricemodel/server.py
**/*.{py,pyi}
📄 CodeRabbit inference engine (CLAUDE.md)
Type check with Pyright, using relaxed import resolution
Files:
- infrastructure/__main__.py
- applications/equitypricemodel/src/equitypricemodel/server.py
applications/*/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Use FastAPI consistently for web services in all applications
Files:
applications/equitypricemodel/src/equitypricemodel/server.py
🧬 Code graph analysis (1)
applications/equitypricemodel/src/equitypricemodel/server.py (3)
applications/equitypricemodel/src/equitypricemodel/preprocess.py (1)
- filter_equity_bars (4-22)

applications/equitypricemodel/src/equitypricemodel/tide_data.py (4)

- Data (29-482)
- load (409-427)
- preprocess_and_set_data (35-219)
- postprocess_predictions (429-482)

applications/equitypricemodel/src/equitypricemodel/tide_model.py (4)

- Model (89-361)
- load (301-316)
- predict (318-324)
- validate (255-274)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Run Rust code checks
🔇 Additional comments (10)
applications/equitypricemodel/src/equitypricemodel/server.py (2)
1-25: LGTM! Imports and setup follow best practices.

The module-level model loading (line 25) correctly addresses the previous concern about loading the model on every request. Using structlog aligns with the coding guidelines for structured logging.
150-181: LGTM! Response parsing correctly implemented.

The function properly:

- Reads Parquet from bytes (line 154)
- Uses `pl.DataFrame(equity_details_response.json())` correctly (line 160, addressing the previous review issue)
- Validates both datasets with schemas
- Joins and selects the required columns
Schema validation failures will naturally propagate as exceptions to the caller, which handles them appropriately.
infrastructure/__main__.py (8)
34-38: LGTM! Tag typo fixed.

The tag dictionary is now correct with `"manager": "pulumi"` (trailing colon removed). The overall setup with account/region retrieval, secret fetching, and ECR image references looks good.
40-230: LGTM! Solid networking foundation.

The VPC setup follows AWS best practices:
- Proper subnet segmentation (public for ALB, private for ECS tasks) across two availability zones
- Correct routing (IGW for public, NAT for private)
- Security groups with appropriate ingress/egress rules (ALB accepts HTTP/HTTPS, ECS tasks accept 8080 from ALB and each other)
232-290: LGTM! Well-configured ECS and load balancing setup.

The infrastructure components are properly configured:
- ECS cluster with Container Insights for observability
- Service Discovery namespace for DNS-based inter-service communication
- ALB with appropriate target groups and health check settings (path="/health", reasonable thresholds)
357-401: LGTM! ALB routing priorities correctly ordered.

The listener rules now have the correct priority ordering (datamanager=100, portfoliomanager=200), ensuring that the more specific `/portfolios*` pattern reaches datamanager before the broader `/portfolio*` pattern. The path patterns are well-defined and non-overlapping given the priority evaluation order.
403-495: LGTM! IAM roles properly configured with least-privilege policies.

The execution and task roles are correctly implemented:
- Execution role has necessary permissions for pulling images and reading secrets from Secrets Manager (scoped to specific secret ARN)
- Task role properly fetches the S3 bucket name from Secrets Manager and constructs a scoped policy granting only required S3 actions on the specific bucket
- All policies follow least-privilege principles
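For reference, a bucket-scoped policy of the kind described can be sketched as below. The bucket name and action list are hypothetical stand-ins, not the repository's actual values; the point is the two-resource shape (bucket ARN for `ListBucket`, `/*` ARN for object actions) that least-privilege S3 policies require.

```python
import json

BUCKET = "example-fund-data"  # hypothetical bucket name

policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            # Object actions need the /* resource; ListBucket needs the bucket itself.
            "Action": ["s3:GetObject", "s3:PutObject", "s3:ListBucket"],
            "Resource": [
                f"arn:aws:s3:::{BUCKET}",
                f"arn:aws:s3:::{BUCKET}/*",
            ],
        }
    ],
}

print(json.dumps(policy, indent=2))
```

In Pulumi this JSON would be passed (via `Output.apply`) as the `policy` argument of an `aws.iam.RolePolicy` attached to the task role.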
518-563: LGTM! Datamanager task definition properly configured.

The task definition correctly includes:
- Fixed family name ("datamanager")
- Task role ARN for S3 access
- Secrets properly configured to pull from Secrets Manager
- Appropriate resource allocation and logging configuration
665-770: LGTM! Service discovery and ECS services correctly configured.

The service discovery names are now fixed to use short names ("datamanager", "portfoliomanager", "equitypricemodel"), which will create the correct DNS entries matching the expected URLs in container environment variables (e.g., `http://datamanager.pocketsizefund.local:8080`).

The ECS services are properly configured with:
- Appropriate load balancer attachments for externally-facing services
- Service registry integration for DNS-based service discovery
- Correct dependency ordering (portfoliomanager and equitypricemodel depend on datamanager)
772-796: LGTM! Comprehensive and useful outputs.

The exported outputs provide all necessary information for downstream usage:
- Infrastructure identifiers (account, VPC, cluster)
- Load balancer endpoints
- Service discovery namespace
- Container image URIs
- Formatted base URL and README
Pull request overview
Copilot reviewed 96 out of 101 changed files in this pull request and generated 6 comments.
Actionable comments posted: 0
🧹 Nitpick comments (2)
applications/equitypricemodel/src/equitypricemodel/server.py (2)
38-86: Consider more specific exception handling for better error diagnostics.

The try-except blocks currently catch generic `Exception`, which can mask unexpected errors. Consider catching specific exception types:

- Lines 38-58 and 60-74: Catch `requests.exceptions.RequestException` for HTTP-related failures
- Lines 76-86: Catch specific exceptions from `parse_responses` (e.g., `pl.exceptions.PolarsError`, `ValueError`)

This allows unexpected exceptions to propagate naturally while handling known failure modes explicitly.
🔎 Example refactor
```diff
     try:
         equity_bars_response = requests.get(
             url=f"{DATAMANAGER_BASE_URL}/equity-bars",
             params={
                 "start_date": start_date.isoformat(),
                 "end_date": end_date.isoformat(),
             },
             timeout=60,
         )
         equity_bars_response.raise_for_status()
-    except Exception as e:
+    except requests.exceptions.RequestException as e:
         logger.exception(
             "Failed to fetch equity bars data",
             start_date=start_date.isoformat(),
             end_date=end_date.isoformat(),
             error=f"{e}",
         )
         return Response(status_code=status.HTTP_500_INTERNAL_SERVER_ERROR)
```

Apply similar changes to the other exception handlers.
145-148: Consider using JSONResponse for more idiomatic FastAPI code.

While the current implementation correctly encodes the JSON response as bytes, using `JSONResponse` would be cleaner and more consistent with FastAPI conventions.

🔎 Proposed refactor
Update the import:
```diff
-from fastapi import FastAPI, Response, status
+from fastapi import FastAPI, Response, status
+from fastapi.responses import JSONResponse
```

Simplify the return statement:
```diff
-    return Response(
-        content=json.dumps({"data": processed_predictions.to_dict()}).encode("utf-8"),
-        status_code=status.HTTP_200_OK,
-    )
+    return JSONResponse(
+        content={"data": processed_predictions.to_dict()},
+        status_code=status.HTTP_200_OK,
+    )
```
📜 Review details
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
applications/equitypricemodel/src/equitypricemodel/server.py
🧰 Additional context used
📓 Path-based instructions (3)
**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
**/*.py: Use Ruff as the project formatter (replaces black/isort) for all Python code
Lint Python code with Ruff using the comprehensive ruleset
Use Structlog for structured logging across services
Files:
applications/equitypricemodel/src/equitypricemodel/server.py
**/*.{py,pyi}
📄 CodeRabbit inference engine (CLAUDE.md)
Type check with Pyright, using relaxed import resolution
Files:
applications/equitypricemodel/src/equitypricemodel/server.py
applications/*/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Use FastAPI consistently for web services in all applications
Files:
applications/equitypricemodel/src/equitypricemodel/server.py
🧬 Code graph analysis (1)
applications/equitypricemodel/src/equitypricemodel/server.py (3)
applications/equitypricemodel/src/equitypricemodel/preprocess.py (1)
- filter_equity_bars (4-22)

applications/equitypricemodel/src/equitypricemodel/tide_data.py (2)

- Data (29-482)
- postprocess_predictions (429-482)

applications/equitypricemodel/src/equitypricemodel/tide_model.py (1)

- predict (318-324)
🔇 Additional comments (5)
applications/equitypricemodel/src/equitypricemodel/server.py (5)
26-26: Good fix: Model loaded at application startup.

Loading the model once at module level significantly improves performance compared to loading it on every request. This properly addresses the previous review feedback.
90-92: Correct use of Data.load() classmethod.

The code properly calls `Data.load()` as a classmethod and uses the returned instance, addressing previous review feedback. Loading the Data instance per request is appropriate since it holds request-specific preprocessed state.
94-98: Good addition: Empty batches validation.

The empty batches check prevents the potential `IndexError` when accessing `batches[-1]` on line 101. This properly addresses previous review feedback with clear error logging.
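The guard in question boils down to failing loudly on an empty sequence instead of letting `[-1]` raise a bare `IndexError`. A minimal sketch of the pattern (hypothetical helper name, not the service's code):

```python
def last_batch(batches: list) -> object:
    """Return the final batch, raising a descriptive error instead of IndexError."""
    if not batches:
        # An explicit domain error is easier to log and diagnose than IndexError.
        raise ValueError("no batches produced by preprocessing")
    return batches[-1]

print(last_batch([10, 20, 30]))  # the final element
```

The descriptive `ValueError` also gives the caller's exception handler something meaningful to log, which a generic `IndexError` from deep inside the pipeline would not.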
155-167: Data parsing and consolidation looks correct.

The function properly:
- Reads Parquet data from bytes (line 155)
- Creates DataFrame from the JSON dict response (line 161), correctly addressing past review feedback
- Validates both data sources with schemas
- Filters equity bars based on criteria
- Performs an inner join to consolidate
The inner join on "ticker" will naturally filter to only tickers present in both datasets.
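The filtering behaviour of an inner join can be shown without polars; the sketch below is a plain-Python stand-in (hypothetical data, not the service's schemas) that mirrors joining bars to details on "ticker" and dropping unmatched rows:

```python
def inner_join(bars: list[dict], details: list[dict], key: str = "ticker") -> list[dict]:
    """Keep only rows whose key appears in both datasets, merging their columns."""
    index = {row[key]: row for row in details}
    return [{**row, **index[row[key]]} for row in bars if row[key] in index]

bars = [{"ticker": "AAA", "close": 10.0}, {"ticker": "BBB", "close": 20.0}]
details = [{"ticker": "AAA", "sector": "tech"}]

print(inner_join(bars, details))  # only the AAA row survives, with both columns
```

"BBB" is dropped because it has no matching details row, which is exactly the implicit filtering the comment notes: no separate validation of ticker overlap is needed before the join.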
48-48: HTTP error handling properly implemented.

The `raise_for_status()` calls after each HTTP request correctly address previous review feedback, ensuring that non-2xx responses are caught and logged with appropriate context.

Also applies to: 66-66, 133-133
Overview
Changes
Comments
This is a broad rebuild of everything in the application.
Summary by CodeRabbit
New Features
Improvements